- Internet Engineering Task Force Audio-Video Transport Working Group
- INTERNET-DRAFT H. Schulzrinne
- draft-ietf-avt-profile-03.txt AT&T Bell Laboratories
- October 20, 1993
- Expires: 12/31/93
-
- Sample Profile and Encodings for the Use of RTP for Audio and Video
- Conferences with Minimal Control
-
-
- Status of this Memo
-
-
- This document is an Internet Draft. Internet Drafts are working documents
- of the Internet Engineering Task Force (IETF), its Areas, and its Working
- Groups. Note that other groups may also distribute working documents as
- Internet Drafts.
-
- Internet Drafts are draft documents valid for a maximum of six months.
- Internet Drafts may be updated, replaced, or obsoleted by other documents
- at any time. It is not appropriate to use Internet Drafts as reference
- material or to cite them other than as a ``working draft'' or ``work in
- progress.''
-
- Please check the I-D abstract listing contained in each Internet Draft
- directory to learn the current status of this or any other Internet Draft.
-
- Distribution of this document is unlimited.
-
-
- Abstract
-
- This note describes a profile for the use of the real-time
- transport protocol (RTP) and the associated control protocol, RTCP,
- within audio and video multiparticipant conferences with minimal
- control. It provides interpretations of generic fields within the
- RTP specification suitable for audio and video conferences. In
- particular, this document defines a set of default mappings from
- format index to encodings.
- The document also describes how audio and video data may be
- carried within RTP. It defines a set of standard encodings and
- their names when used within RTP. However, the definitions are
- independent of the particular transport mechanism used. The
- descriptions provide pointers to reference implementations and
- the detailed standards. This document is meant as an aid
- for implementors of audio, video and other real-time multimedia
- applications.
- INTERNET-DRAFT draft-ietf-avt-profile-03.txt October 20, 1993
-
- Contents
-
-
- 1 Introduction 2
-
- 2 Demultiplexing 3
-
- 3 Audio 3
-
- 3.1 Encoding-independent recommendations . . . . . . . . . . . . . . . 3
-
- 3.2 Recommended Audio Encodings. . . . . . . . . . . . . . . . . . . . 4
-
- 3.3 The RTCP FMT Option for Audio. . . . . . . . . . . . . . . . . . . 6
-
- 3.4 Port Assignment. . . . . . . . . . . . . . . . . . . . . . . . . . 7
-
- 4 Video 8
-
- 4.1 The RTCP FMT Option for Video. . . . . . . . . . . . . . . . . . . 9
-
- 4.2 Port Assignment. . . . . . . . . . . . . . . . . . . . . . . . . . 9
-
- 5 Miscellaneous 10
-
- 6 Address of Author 10
-
-
- 1 Introduction
-
-
- This profile defines aspects of RTP left unspecified in the RTP protocol
- definition (RFC TBD). This profile is intended for use within audio and
- video conferences with minimal session control. In particular, no support
- for the negotiation of parameters or membership control is provided. Other
- profiles may make different choices for the items specified here. The
- profile specifies the use of RTP over unicast and multicast UDP as well
- as ST-II. For unicast UDP and ST-II, references to multicast addresses
- are to be ignored. The use of this profile is indicated by the use of
- a media-specific well-known port number. The profile may also be used
- with other port numbers. For example, the use of a particular session
- announcement tool could imply use of this profile.
-
-
-
-
-
-
-
- H. Schulzrinne Expires 12/31/93 [Page 2]
-
- 2 Demultiplexing
-
-
- For applications which choose to share a single network destination address
- and port for both audio and video, the default channel identifier for audio
- is 0 and for video is 1. In that case, the port number for audio is used.
- This combination should only be used when it is known that all receiving
- applications can properly demultiplex audio and video.
-
-
- 3 Audio
-
-
- 3.1 Encoding-independent recommendations
-
-
- The following recommendations are default operating parameters.
- Applications should be prepared to handle other values. The ranges
- given are meant to give guidance to application writers, allowing a set
- of applications conforming to these guidelines to interoperate without
- additional negotiation. These guidelines are not intended to restrict
- operating parameters for applications that can negotiate a set of
- interoperable parameters, e.g., through a conference control protocol.
-
- For packetized audio, the default packetization interval should have a
- duration of 20 ms, unless otherwise noted in Table 1. The packetization
- interval determines the minimum end-to-end delay; longer packets introduce
- less header overhead but higher delay and make packet loss more noticeable.
- For non-interactive applications such as lectures or links with severe
- bandwidth constraints, a higher packetization delay may be appropriate. For
- frame-based encodings (marked as F in Table 1 below) such as LPC, CELP
- and GSM, the sender may choose to combine several frame intervals into a
- single message. The receiver can tell the number of frames contained in a
- message since the frame duration is defined as part of the encoding.
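- The frame-count rule above can be sketched as follows. This is only an
- illustration: the bit rates and frame durations come from Table 1, but
- the octet size of a frame is an ASSUMPTION (bits per frame rounded up to
- a whole octet); the actual packing is defined by each encoding's own
- specification. The function names are hypothetical.

```python
import math

# (bit rate in kb/s, frame duration in ms), taken from Table 1.
FRAME_CODECS = {"GSM": (13.0, 20), "1016": (4.8, 30)}


def frame_octets(name):
    """Octets per frame, ASSUMING each frame is padded to a whole octet."""
    kbps, frame_ms = FRAME_CODECS[name]
    bits = round(kbps * frame_ms)  # kb/s times ms gives bits
    return math.ceil(bits / 8)


def frames_in_payload(name, payload_len):
    """Return (frame count, duration in ms) for payload_len octets."""
    size = frame_octets(name)
    if payload_len % size:
        raise ValueError("payload is not a whole number of frames")
    count = payload_len // size
    return count, count * FRAME_CODECS[name][1]
```

- Under these assumptions, a 99-octet GSM payload holds three frames and
- therefore 60 ms of audio.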
-
- If multiple channels are used, the left channel information always precedes
- the right-channel information. For more than two channels, the convention
- followed by the AIFF-C audio interchange format should be followed. (The
- AIFF-C specification is available by anonymous ftp at ftp.sgi.com in the
- file sgi/aiff-c.9.26.91.ps.) For two-channel stereo, the sequence is left,
- right; for three channels, left, right, center; for quadraphonic systems,
- front left, front right, rear left, rear right; for four-channel systems,
- left, center, right, and surround sound; for six-channel systems, left, left
- center, center, right, right center, and surround sound.
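- As a rough sketch, the channel orders above can be held in a lookup table
- and used to build a packet payload. Sample-by-sample interleaving is an
- ASSUMPTION here (the text only fixes the order of the channels), and the
- names CHANNEL_ORDER, QUAD, FOUR_CHANNEL and interleave are hypothetical.

```python
# Channel orders as listed in the text; the labels are illustrative.
CHANNEL_ORDER = {
    2: ("left", "right"),
    3: ("left", "right", "center"),
    6: ("left", "left center", "center", "right", "right center", "surround"),
}
# Four channels are ambiguous by count alone, so they get named layouts.
QUAD = ("front left", "front right", "rear left", "rear right")
FOUR_CHANNEL = ("left", "center", "right", "surround")


def interleave(channels, order):
    """Interleave per-channel sample lists, one sample per channel in turn."""
    streams = [channels[name] for name in order]
    if len({len(s) for s in streams}) != 1:
        raise ValueError("all channels must carry the same number of samples")
    return [sample for frame in zip(*streams) for sample in frame]
```

- For example, interleave({"right": [10, 20], "left": [1, 2]},
- CHANNEL_ORDER[2]) places each left sample ahead of the matching right
- sample regardless of the order in which the channels were supplied.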
-
- The sampling frequency should be drawn from the set: 8, 11.025, 16, 22.05,
- 44.1 and 48 kHz.
-
-
-
-
-
-
- 3.2 Recommended Audio Encodings
-
-
- Table 1 shows the names, types (sample- vs. frame-oriented), per-channel
- bit rates and default sampling frequencies of the recommended encodings. The
- list is partially drawn from the document "Recommended Practices for
- Enhancing Digital Audio Compatibility in Multimedia Systems", published by
- the Interactive Multimedia Association, Version 3.00, Oct. 1992 (referenced
- as [IMA]). The names are for identification only; they correspond to the
- names used within the Real-Time Transport Protocol (RTP). Other applications
- may choose different names. Note that the L16 encoding may be used with
- different sampling rates. The CCITT changed its name to ITU-T in 1993; to
- limit confusion, both the old and the new name are used.
-
-
-   name   sampling   bit rate   type    frame   description
-          (kHz)      (kb/s)     (S/F)   (ms)
-   _________________________________________________________________________
-   L16    48         768        S               16-bit linear, 2's complement
-   L16    44.1       705.6      S
-   L16    22.05      352.8      S
-   L16    11.025     176.4      S
-   G722   16         64         S               CCITT/ITU-T subband ADPCM
-   PCMU   8          64         S               CCITT/ITU-T mu-law PCM
-   PCMA   8          64         S               CCITT/ITU-T A-law PCM
-   G721   8          32         S               CCITT/ITU-T ADPCM
-   IDVI   8          32         S               Intel/DVI ADPCM [IMA]
-   G723   8          24         S               CCITT/ITU-T ADPCM
-   GSM    8          13         F       20      RPE/LTP GSM 06.10
-   1016   8          4.8        F       30      CELP
-   _________________________________________________________________________
-
- Table 1: Audio encodings
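-
- The per-channel bit rates of the sample-based entries follow from the
- sampling rate multiplied by the number of bits per sample. The sketch
- below checks that arithmetic. The 16-bit and 8-bit sample sizes are stated
- in the encoding descriptions; the 4-bit and 3-bit values for G721, IDVI
- and G723 are INFERRED here from the listed rates, and G722 is left out
- because it is a subband coder. All names are hypothetical.

```python
import math

# (name, sampling rate in kHz, bits per sample, listed bit rate in kb/s)
SAMPLE_CODECS = [
    ("L16", 48, 16, 768), ("L16", 44.1, 16, 705.6),
    ("L16", 22.05, 16, 352.8), ("L16", 11.025, 16, 176.4),
    ("PCMU", 8, 8, 64), ("PCMA", 8, 8, 64),
    ("G721", 8, 4, 32), ("IDVI", 8, 4, 32), ("G723", 8, 3, 24),
]


def bit_rate_kbps(sampling_khz, bits_per_sample):
    """Per-channel bit rate in kb/s of a sample-based encoding."""
    return sampling_khz * bits_per_sample


def table_is_consistent():
    """True if every listed rate equals sampling rate times sample size."""
    return all(
        math.isclose(bit_rate_kbps(khz, bits), listed)
        for _name, khz, bits, listed in SAMPLE_CODECS
    )
```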
-
- For multi-octet encodings, octets are transmitted in network byte order
- (i.e., most significant octet first).
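-
- A minimal sketch of the byte-order rule for L16, using Python's standard
- struct module with the big-endian ('>') prefix. The helper names pack_l16
- and unpack_l16 are hypothetical.

```python
import struct


def pack_l16(samples):
    """Pack signed 16-bit samples, most significant octet first."""
    return struct.pack(f">{len(samples)}h", *samples)


def unpack_l16(payload):
    """Recover signed 16-bit samples from a network-byte-order payload."""
    if len(payload) % 2:
        raise ValueError("L16 payload must hold an even number of octets")
    return list(struct.unpack(f">{len(payload) // 2}h", payload))
```

- The sample value 1 is sent as the octets 0x00 0x01, and -2 as 0xFF 0xFE.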
-
- A detailed description of the encodings is given below. The names shown
- (L16, PCMU, etc.) are limited to four characters and suitable to be used
- for identification in protocols such as RTP (RFC TBD).
-
-
- L16: denotes uncompressed audio data, using 16-bit signed representation
- with 65535 equally divided steps between minimum and maximum signal
- level, ranging from -32768 to 32767. The value is represented in two's
- complement notation.
-
- PCMU: specified in CCITT/ITU-T recommendation G.711. Audio data is encoded
- as eight bits per sample, after companding. Code to convert between
- linear and mu-law companded data is available in the IMA document.
-
- PCMA: specified in CCITT/ITU-T recommendation G.711. Audio data is encoded
- as eight bits per sample, after companding. Code to convert between
- linear and A-law companded data is available in the IMA document.
-
- G721 through G729: specified in the corresponding CCITT/ITU-T
- recommendations. Reference implementations for G.721 and G.723 are
- available as part of the CCITT/ITU-T Software Tool Library (STL) from the
- ITU General Secretariat, Sales Service, Place des Nations, CH-1211
- Geneve 20, Switzerland. The library is covered by a license
- and is available for anonymous ftp on gaia.cs.umass.edu, file
- pub/ccitt/ccitt_tools.tar.Z.
-
- GSM: (Groupe Special Mobile) denotes the European GSM 06.10 provisional
- standard for full-rate speech transcoding, prI-ETS 300 036, which
- is based on RPE/LTP (residual pulse excitation/long term prediction)
- coding at a rate of 13 kb/s. A reference implementation was written by
- Carsten Bormann and Jutta Degener (TU Berlin, Germany) and is available
- by anonymous ftp from tub.cs.tu-berlin.de, directory tub/tubmik.
-
- 1016: uses code-excited linear prediction (CELP) and is specified in
- Federal Standard FED-STD 1016, published by the Office of Technology
- and Standards, Washington, DC 20305-2010.
-
- The U. S. DoD's Federal-Standard-1016 based 4800 bps code excited
- linear prediction voice coder version 3.2 (CELP 3.2) Fortran and
- C simulation source codes are available for worldwide distribution
- at no charge (on DOS diskettes, but configured to compile on Sun
- SPARC stations) from: Bob Fenichel, National Communications System,
- Washington, D.C. 20305, phone +1-703-692-2124, fax +1-703-746-4960.
-
- Example input and processed speech files, a technical information
- bulletin, and the official standard "Federal Standard 1016,
- Telecommunications: Analog to Digital Conversion of Radio Voice by 4,800
- bit/second Code Excited Linear Prediction (CELP)" are included at no
- charge. According to Vincent Cate (Carnegie Mellon), the distribution
- is also available for anonymous ftp at furmint.nectar.cs.cmu.edu
- (128.2.209.111) in directory celp.audio.compression.
-
- The following articles describe the Federal-Standard-1016 4.8-kbps
- CELP coder:
-
- Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The
- Proposed Federal Standard 1016 4800 bps Voice Coder: CELP," Speech
- Technology Magazine, April/May 1990, p. 58-64.
-
- Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The
- Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal
- Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155.
-
- Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The
- DoD 4.8 kbps Standard (Proposed Federal Standard 1016)," in Advances
- in Speech Coding, ed. Atal, Cuperman and Gersho, Kluwer Academic
- Publishers, 1991, Chapter 12, p. 121-133.
-
- Copies of the FS-1016 document are available for $2.50 each from:
-
-
- GSA Rm 6654
- 7th & D St SW
- Washington, D.C. 20407
- 1-202-708-9205
-
-
- IDVI: is specified in the "Recommended Practices for Enhancing Digital Audio
- Compatibility in Multimedia Systems", published by the Interactive
- Multimedia Association (IMA), Annapolis, MD. The document also contains
- reference implementations for mu-law to 16-bit, ADPCM and sample rate
- conversions.
-
-
- For sample-based encodings, a receiver should accept packets representing
- between 0 and 200 ms of audio data.(1) Receivers should be prepared to
- accept multi-channel audio, but may choose to only play a single channel.
-
- All block-oriented audio codecs should be able to encode and decode several
- consecutive blocks within a single packet. Since the frame size for
- the block-oriented codecs is given, there is no need to use a separate
- designation for the same encoding, but with different number of blocks per
- packet.
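-
- The 0 to 200 ms acceptance range for sample-based encodings can be
- checked from the payload length alone. The sketch below assumes the
- receiver already knows the sample size, channel count and sampling rate
- for the packet's format; the function names are hypothetical.

```python
MAX_ACCEPTED_MS = 200  # upper bound given in the text


def payload_duration_ms(payload_len, bits_per_sample, channels, sampling_hz):
    """Audio time in ms carried by a sample-based payload of payload_len octets."""
    total_bits = payload_len * 8
    bits_per_instant = bits_per_sample * channels
    if total_bits % bits_per_instant:
        raise ValueError("payload does not hold a whole number of samples")
    samples = total_bits // bits_per_instant
    return 1000.0 * samples / sampling_hz


def acceptable(duration_ms):
    """True if the packet duration lies within the recommended range."""
    return 0 <= duration_ms <= MAX_ACCEPTED_MS


def buffer_octets(bits_per_sample, channels, sampling_hz):
    """Octets needed to buffer one maximum-length packet."""
    bits = bits_per_sample * channels * sampling_hz * MAX_ACCEPTED_MS // 1000
    return (bits + 7) // 8
```

- For single-channel PCMU at 8 kHz, a 160-octet payload carries 20 ms and
- the largest packet a receiver should expect is 1600 octets.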
-
-
- 3.3 The RTCP FMT Option for Audio
-
-
- Unless specified with the FMT option, the mapping between the format field
- in an RTP packet and audio encodings, sampling rates and channel counts is
- specified by Table 2.
-
- Format values of 31 and below cannot be redefined by FMT options. In other
- words, only values of 32 and above are valid in the format field within an
- FMT option. The receiver is expected to discard RTP packets containing
- media data with unknown format field values. Sites are expected to keep
- the mapping between format and encoding constant, so that lost packets
- containing FMT options do not lead the receiver to misinterpret media data.
- Additional standard encodings may be registered with the Internet Assigned
- Numbers Authority (IANA). The format name is intended to describe the format
- in an unambiguous way; it is interpreted as a sequence of four ASCII
- characters, with uppercase and lowercase characters treated as distinct.
- Format names beginning with the letter 'X' are reserved for experimental use
- and not subject to registration. These experimental encodings may be mapped
- to format values 32 and above using the FMT option. Additional standard
- mappings to format values of 31 and below may also be registered with IANA.
- Registered assignments are published periodically in the Assigned Numbers
- RFC.
-
- ------------------------------
- 1. This restriction allows reasonable buffer sizing for the receiver.
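-
- The format-value rules above can be sketched as a small lookup table.
- The default entries are taken from Table 2; the class and method names
- are hypothetical, and the upper bound of 63 is INFERRED from the
- six-bit format field shown in Figure 1.

```python
# Default mappings from Table 2: value -> (name, sampling rate in Hz, channels)
DEFAULT_AUDIO_FORMATS = {
    0: ("PCMU", 8000, 1), 1: ("1016", 8000, 1), 2: ("G721", 8000, 1),
    3: ("GSM", 8000, 1), 4: ("G723", 8000, 1), 5: ("IDVI", 8000, 1),
    10: ("L16", 44100, 2),
}


class FormatTable:
    """Maps RTP format values to encodings for one source."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._dynamic = {}

    def apply_fmt_option(self, value, name, rate_hz, channels):
        """Record a mapping announced in an FMT option."""
        if not 32 <= value <= 63:
            raise ValueError("FMT options may only define values 32..63")
        if channels == 0:
            raise ValueError("a channel count of zero is invalid")
        self._dynamic[value] = (name, rate_hz, channels)

    def lookup(self, value):
        """Return the mapping, or None if the packet should be discarded."""
        if value <= 31:
            return self._defaults.get(value)
        return self._dynamic.get(value)
```

- A receiver would call lookup() for every RTP packet and drop the packet
- when the result is None.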
-
- Within the FMT option, the format name is followed by a field containing a
- channel count and a sample rate field, measured in samples per second.(2) A
- channel count of zero is considered invalid. A packetization interval of 20
- ms or a multiple thereof is suggested as it leads to integral sample counts
- for all common sampling rates except 11.025 kHz, which yields 220.5 samples
- per 20 ms and therefore needs a multiple of 40 ms.
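-
- The sample-count arithmetic can be worked through for the sampling rates
- listed in Section 3.1. The sketch below uses exact fractions; the helper
- names are hypothetical. Note that 11.025 kHz is the one listed rate that
- gives a half sample at 20 ms and becomes integral at 40 ms.

```python
from fractions import Fraction

# Sampling rates from Section 3.1, in Hz.
RATES_HZ = [8000, 11025, 16000, 22050, 44100, 48000]


def samples_per_packet(rate_hz, interval_ms):
    """Exact number of samples per channel in one packetization interval."""
    return Fraction(rate_hz * interval_ms, 1000)


def integral_rates(interval_ms):
    """Rates for which the interval holds a whole number of samples."""
    return [
        rate for rate in RATES_HZ
        if samples_per_packet(rate, interval_ms).denominator == 1
    ]
```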
-
-
- 0 1 2 3
- 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |F| FMT | length |0|0| format | reserved |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | name of format |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | channels | sampling rate (Hz) |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- ... encoding specific parameters ...
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
-
- Figure 1: FMT option for audio encodings
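-
- The layout in Figure 1 can be sketched with Python's struct module. Several
- details are ASSUMPTIONS, since this document does not define them: the
- numeric option type for FMT (passed in as a parameter), the length field
- counted in 32-bit words, and space padding for names shorter than four
- characters. The meaning of the F bit is left to the RTP specification.
- The function names are hypothetical.

```python
import struct

_LAYOUT = ">BBBB4sI"  # F+type, length, 00+format, reserved, name, channels+rate


def pack_audio_fmt(option_type, fmt, name, channels, rate_hz, f_bit=0):
    """Build a 12-octet audio FMT option without encoding-specific parameters."""
    if not 0 <= option_type < 128:
        raise ValueError("option type must fit in 7 bits")
    if not 32 <= fmt <= 63:
        raise ValueError("only format values 32..63 are valid in an FMT option")
    if not 1 <= channels <= 255:
        raise ValueError("channel count must be 1..255")
    if not 0 <= rate_hz < 1 << 24:
        raise ValueError("sampling rate must fit in 24 bits")
    raw_name = name.encode("ascii")
    if len(raw_name) > 4:
        raise ValueError("format names are limited to four characters")
    raw_name = raw_name.ljust(4, b" ")  # ASSUMPTION: pad with spaces
    first = ((f_bit & 1) << 7) | option_type
    length_words = 3  # ASSUMPTION: length counts 32-bit words
    return struct.pack(_LAYOUT, first, length_words, fmt, 0, raw_name,
                       (channels << 24) | rate_hz)


def unpack_audio_fmt(data):
    """Parse the fixed 12-octet part of an audio FMT option."""
    first, length, fmt_octet, _reserved, raw_name, word = struct.unpack(
        _LAYOUT, data[:12])
    return {
        "f_bit": first >> 7,
        "option_type": first & 0x7F,
        "length": length,
        "format": fmt_octet & 0x3F,
        "name": raw_name.decode("ascii").rstrip(" "),
        "channels": word >> 24,
        "rate_hz": word & 0xFFFFFF,
    }
```

- Keeping the option type as a parameter avoids hard-coding a value that
- must come from the RTP specification itself.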
-
-
- 3.4 Port Assignment
-
-
- ST-II SAP and UDP port 5005 is the default destination for multicast
- real-time audio data carried by RTP for this profile.
-
- A fixed port number is useful as it is less likely than a randomly chosen
- port number to be already in use by another application at one or more of
- the intended destination hosts. Also, fixed port numbers allow traffic
- statistics to be collected and may simplify firewall implementations. A
- single fixed port number requires that hosts allow several processes to use
- a single UDP port with different multicast addresses. (The particular port
- number was chosen to lie in the range above 5000 to accommodate port number
- allocation practice within the Unix operating system, where port numbers
- below 1024 can only be used by privileged processes and port numbers between
- 1024 and 5000 are automatically assigned by the operating system.)
-
- Unicast connections may use this or a set of mutually agreed-upon port
- numbers.
-
-
-   index   encoding   sampling rate   channels
-           name       (kHz)
-   __________________________________________
-   0       PCMU       8               1
-   1       1016       8               1
-   2       G721       8               1
-   3       GSM        8               1
-   4       G723       8               1
-   5       IDVI       8               1
-   10      L16        44.1            2
-   __________________________________________
-
- Table 2: Standard audio encodings
-
- ------------------------------
- 2. Fractional samples per second was considered excessive as the typical
- crystal accuracy of 100 ppm translates into about one Hz or more of
- sampling rate inaccuracy.
-
-
- 4 Video
-
-
- The following video encodings are currently defined, with their abbreviated
- names used for identification:
-
-
- CPV: This encoding, "Compressed Packet Video" is implemented by Concept,
- Bolter, and ViewPoint Systems video codecs.
-
- JPEG: The encoding is specified in ISO Standards DIS 10918-1 and DIS
- 10918-2. The data is formatted according to the JFIF (JPEG File
- Interchange Format) defined by C-Cube Microsystems.
-
- H261: The encoding is specified in CCITT/ITU-T standard H.261. The
- packetization and RTP-specific properties are described in RFC TBD.
-
- nv: The encoding is implemented in the program 'nv' developed at Xerox PARC
- by Ron Frederick.
-
- CUSM: The encoding is implemented in the program CU-SeeMe developed at
- Cornell University by Dick Cogger, Scott Brim, Tim Dorcey and John
- Lynn.
-
- PicW: The encoding is implemented in the program PictureWindow developed at
- Bolt, Beranek and Newman (BBN).
-
-
-
-
- 4.1 The RTCP FMT Option for Video
-
-
- Unless specified with the RTCP FMT option, the mapping between the format
- field in an RTP packet and the video encoding is specified by Table 3. The
- second paragraph of Section 3.3 applies for video as well.
-
- Within the video FMT option, a one-octet numeric version identifier further
- describes the encoding. Unless otherwise defined, the version identifier
- has the value zero.
-
- 0 1 2 3
- 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |F| FMT | length |0|0| format | reserved |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | name of format |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | version | encoding-specific parameters |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- ... encoding-specific parameters ...
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
-
- Figure 2: FMT option for video encodings
-
-
- number name
- ______________
- 26 JPEG
- 27 CUSM
- 28 nv
- 29 PicW
- 30 Bolt
- 31 H261
-
-
- Table 3: Format values for standard video encodings
-
-
- 4.2 Port Assignment
-
-
- ST-II SAP and UDP port 5006 is the default destination for multicast
- real-time video data carried by RTP for this profile. The remainder of
- section 3.4 applies.
-
-
-
-
-
-
-
- 5 Miscellaneous
-
-
- RTCP messages should be sent periodically, with a period varying randomly
- around a set mean to avoid synchronized bursts of RTCP packets. (For
- example, the time between messages could vary uniformly between one half and
- 1.5 times the mean.) The average period between transmissions determines
- the additional network load due to RTCP packets and also determines how
- long it will take a new arrival to discover the identities of the other
- conference participants. The average period should be chosen such that no
- more than a small fraction (say, 1%) of the media bandwidth is consumed by
- RTCP messages from all sources, with a minimum period of a few seconds.
- By scaling the message frequency with the (slowly increasing) number of
- observed participants, a new conference participant will quickly inform all
- other participants of its arrival and then slow its announcement rate.
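-
- The timing rule above can be sketched as follows. The 1% fraction and
- the 0.5 to 1.5 range come from the text; the five-second floor stands in
- for "a few seconds" and is an ASSUMPTION, as are the function and
- parameter names.

```python
import random


def mean_rtcp_period(n_sources, rtcp_packet_octets, media_bw_bps,
                     fraction=0.01, min_period_s=5.0):
    """Mean per-source period in seconds.

    Chosen so that RTCP packets from all sources together use at most
    `fraction` of the media bandwidth, but never less than min_period_s.
    """
    rtcp_budget_bps = fraction * media_bw_bps
    period = n_sources * rtcp_packet_octets * 8 / rtcp_budget_bps
    return max(period, min_period_s)


def next_rtcp_delay(mean_period_s, rng=random):
    """Delay until the next RTCP message, uniform in 0.5..1.5 times the mean."""
    return rng.uniform(0.5 * mean_period_s, 1.5 * mean_period_s)
```

- Because the period grows with the number of observed sources, a newcomer
- that has seen few participants starts at the minimum period and slows
- down as it learns of the others.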
-
-
- 6 Address of Author
-
-
- Henning Schulzrinne
- AT&T Bell Laboratories
- MH 2A244
- 600 Mountain Avenue
- Murray Hill, NJ 07974-0636
- telephone: +1 908 582 2262
- facsimile: +1 908 582 5809
- electronic mail: hgs@research.att.com
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-